This script analyzes absorbance and fluorescence data from all observations in my study of C. chamissoi.
All helper functions are defined in the process_plate_run.R script and
sourced into this Markdown document. Store that script, along with all
input and output files, in the working directory of your choice.
source("process_plate_run.R") #Contains all the functions I have built
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.1 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'zoo'
##
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
##
##
## Attaching package: 'plotly'
##
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
##
## The following object is masked from 'package:stats':
##
## filter
##
##
## The following object is masked from 'package:graphics':
##
## layout
##
##
## here() starts at G:/My Drive/ACES/Dissy/analysis/PE
##
## # Attaching packages: easystats 0.7.4
## ✔ bayestestR 0.16.0 ✔ correlation 0.8.7
## ✔ datawizard 1.1.0 ✔ effectsize 1.0.0
## ✔ insight 1.3.0 ✔ modelbased 0.11.0
## ✔ performance 0.14.0 ✔ parameters 0.26.0
## ✔ report 0.6.1 ✔ see 0.11.0
##
##
##
## Attaching package: 'ggpubr'
##
##
## The following objects are masked from 'package:datawizard':
##
## mean_sd, median_mad
WE FIRST LOAD THE INPUT FILES. Our project has three phycoerythrin absorbance sheets and three complementary fluorescence data sheets. We also have three spreadsheets of sample weights and a datasheet for our replicates. We load all .xlsx files and convert them into data frame objects, retaining their file names, using the load_excel_files function.
load_excel_files("Input PE")
## Finished loading 10 Excel files from G:/My Drive/ACES/Dissy/analysis/PE/Input PE
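The loader above lives in process_plate_run.R; a minimal sketch of what such a function might do, assuming the readxl package, could look like this (illustrative only, the actual implementation may differ):

```r
# Hypothetical sketch of an Excel-folder loader; the real load_excel_files()
# is defined in process_plate_run.R.
library(readxl)

load_excel_files_sketch <- function(dir_path, envir = globalenv()) {
  files <- list.files(dir_path, pattern = "\\.xlsx$", full.names = TRUE)
  for (f in files) {
    # Keep the file name (minus extension) as the data frame's object name
    obj_name <- tools::file_path_sans_ext(basename(f))
    assign(obj_name, as.data.frame(read_excel(f)), envir = envir)
  }
  message("Finished loading ", length(files), " Excel files from ", dir_path)
}
```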
The tidy_all wrapper function and its subordinate
function tidy_and_correct process spectral absorbance
data from a multi-wavelength microplate reader assay to quantify
phycoerythrin (PE) content in biological samples. They perform the
following key steps:
Data Import and Cleaning:
Loads raw data from an Excel file (second sheet), removes header rows,
and fills missing row identifiers for well labels.
Data Reshaping:
Organizes the spectral data from multiple wavelengths into a tidy format
where each row corresponds to a single well and each column to a
specific wavelength measurement.
Blank Correction:
Averages absorbance values from specified blank wells (e.g.,
A01, A02, A03) and subtracts
these values from all sample wells to correct for background
signal.
Quality Filtering:
Removes any samples with negative absorbance values at any measured
wavelength, tracking and reporting which samples were removed and for
which wavelength(s). Compares absorbance at X565 and X564.
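The blank-correction and filtering steps above can be sketched as follows. This is a simplified stand-in for tidy_and_correct, not the actual implementation; the wavelength columns and blank well IDs are taken from this document, everything else is illustrative:

```r
library(dplyr)

# Simplified blank correction: average the blank wells, subtract the means
# from every sample well, then drop rows with any negative corrected value.
blank_correct_sketch <- function(df, blanks,
                                 wl_cols = c("X455", "X564", "X565", "X592")) {
  blank_means <- df %>%
    filter(Cell_ID %in% blanks) %>%
    summarise(across(all_of(wl_cols), mean))
  corrected <- df %>%
    filter(!Cell_ID %in% blanks) %>%
    mutate(across(all_of(wl_cols), ~ .x - blank_means[[cur_column()]]))
  bad <- corrected %>% filter(if_any(all_of(wl_cols), ~ .x < 0))
  if (nrow(bad) > 0) {
    message("Removed rows due to negative values: ",
            paste(bad$Cell_ID, collapse = ", "))
  }
  corrected %>% filter(if_all(all_of(wl_cols), ~ .x >= 0))
}
```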
FLUORESCENCE AND ABSORBANCE ANALYSIS
# Define blank sample wells for each experimental run
blanks1 <- "A01"
blanks2 <- c("A01", "A02", "A03")
blanks3 <- c("H09", "H10", "H11")
# Create named lists of raw dataframes and corresponding blanks
listPE <- list(PE1 = PE1, PE2 = PE2, PE3 = PE3)
listFluor <- list(Fluor1 = Fluor1, Fluor2 = Fluor2, Fluor3 = Fluor3)
listblank <- list(blanks1, blanks2, blanks3)
# Clean and structure all datasets using a custom `tidy_all()` function
tidy_all(listPE, listblank)
## Processing: PE1 with blanks: A01
## Removed rows due to negative values:
## [1] "A01" "C01" "D07" "E01" "F01" "F07" "G01" "H07"
## Saved: PE1_tidy to global environment.
## Processing: PE2 with blanks: A01, A02, A03
## Removed rows due to negative values:
## [1] "A02" "A03" "A07" "A09" "A11" "B03" "D07" "G02" "G03" "G07"
## Saved: PE2_tidy to global environment.
## Processing: PE3 with blanks: H09, H10, H11
## Removed rows due to negative values:
## [1] "D07" "D08" "G01" "H11"
## Saved: PE3_tidy to global environment.
tidy_all(listFluor, listblank)
## Processing: Fluor1 with blanks: A01
## Saved: Fluor1_tidy to global environment.
## Processing: Fluor2 with blanks: A01, A02, A03
## Removed rows due to negative values:
## [1] "A02"
## Saved: Fluor2_tidy to global environment.
## Processing: Fluor3 with blanks: H09, H10, H11
## Removed rows due to negative values:
## [1] "E06" "E07" "E08" "E09" "E10" "E11" "E12" "F01" "F04" "F05" "F06" "F07"
## [13] "F08" "F09" "F10" "F11" "F12" "G02" "G03" "G04" "G06" "G07" "G08" "G09"
## [25] "G10" "G11" "G12" "H01" "H02" "H03" "H04" "H05" "H06" "H07" "H08" "H09"
## [37] "H10" "H12"
## Saved: Fluor3_tidy to global environment.
All tidied up now! ✅
📊 SUMMARY STATISTICS TABLE
# Create named summaries of cleaned fluorescence and absorbance data
summaries <- list(
Fluor1 = summary(Fluor1_tidy$Xred),
Fluor2 = summary(Fluor2_tidy$Xred),
Fluor3 = summary(Fluor3_tidy$Xred),
PE1 = summary(PE1_tidy$X565),
PE2 = summary(PE2_tidy$X565),
PE3 = summary(PE3_tidy$X565)
)
# Combine into a single summary table
all_stats <- unique(unlist(lapply(summaries, names)))
summary_table <- data.frame(
Statistic = all_stats,
do.call(cbind, lapply(summaries, function(s) {
s_named <- as.list(s)
sapply(all_stats, function(k) s_named[[k]] %||% NA) # %||% (from rlang/purrr) falls back to NA when a statistic is absent
}))
)
colnames(summary_table)[-1] <- names(summaries)
summary_table[,-1] <- round(summary_table[,-1], 2)
print(summary_table)
## Statistic Fluor1 Fluor2 Fluor3 PE1 PE2 PE3
## Min. Min. 0.00 0.00 6.33 0.00 0.00 0.00
## 1st Qu. 1st Qu. 5425.00 2583.00 8456.58 0.01 0.01 0.01
## Median Median 11356.00 6337.00 15958.83 0.01 0.01 0.01
## Mean Mean 15257.96 8108.97 22470.98 0.02 0.02 0.03
## 3rd Qu. 3rd Qu. 22447.25 10052.50 24916.33 0.02 0.02 0.03
## Max. Max. 47848.00 45258.00 84905.33 0.58 0.16 0.32
## NA's NA's NA NA 4.00 NA NA NA
# Compute difference between 565 and 564 nm absorbance for each PE dataset
PE_list <- list(PE1_tidy = PE1_tidy, PE2_tidy = PE2_tidy, PE3_tidy = PE3_tidy)
PE_list <- lapply(PE_list, function(df) {
df %>% dplyr::mutate(diff_absorbance = X565 - X564)
})
list2env(PE_list, envir = .GlobalEnv) # Save updated datasets to global environment
## <environment: R_GlobalEnv>
# For each tidy PE dataset, plot `diff_absorbance` by Cell_ID and save as PNG
plots <- list() # Initialize empty list to store plots
for (name in names(PE_list)) {
df <- PE_list[[name]]
p <- ggplot(df, aes(x = Cell_ID, y = diff_absorbance)) +
geom_bar(stat = "identity", fill = "skyblue") +
geom_text(aes(label = round(diff_absorbance, 3)), vjust = -0.5, size = 3) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
labs(
title = paste("Difference between 565 and 564 Absorbance -", name),
x = "Sample (Cell_ID)",
y = "Difference (X565 - X564)"
)
# Save static plot as PNG (optional)
ggsave(paste0(name, "_diff_absorbance.png"), plot = p, width = 10, height = 6, dpi = 300)
# Store interactive plot in the list
plots[[name]] <- ggplotly(p)
}
# Display all interactive plots in your HTML output
htmltools::tagList(plots)
df_ratios <- PE2_tidy %>%
mutate(
`X455/X564` = X455 / X564,
`X592/X564` = X592 / X564,
Sample = row_number()
) %>%
pivot_longer(cols = c(`X455/X564`, `X592/X564`), names_to = "Ratio", values_to = "Value")
# Plot
ggplot(df_ratios, aes(x = Ratio, y = Value)) +
geom_boxplot() +
geom_jitter(width = 0.1, alpha = 0.5, color = "black") +
geom_hline(yintercept = 1.0, linetype = "dashed", color = "red") +
labs(
title = "Relative Absorbance: X455 and X592 Compared to X564",
y = "Absorbance Ratio",
x = "Absorbance Ratio Type"
) +
theme_minimal()
### Description of the joindf_by_id Function
The joindf_by_id function takes two data frames
(df1 and df2) and merges them by matching
unique identifiers related to samples, specifically using either a
Cell_ID column or a plate well column. The key
steps and features are:
- One data frame contains a Cell_ID column and the other contains a plate well column, which serve as the join keys.
- The data frame with the Cell_ID column is assigned as the base (df_cell), and the other as the joining data (df_plate).
- Both keys are standardized into a common column (join_id) to facilitate the join.
- The merged result is saved to a CSV file named by output_name. Additionally, unmatched rows are saved in a separate CSV file.
#Set up new list of weights data to join
list_weights <- list(pe1_weights_id, pe2_weights_id, pe3_weights_id)
# Loop through both lists in parallel
mapply(function(df1, df2, name) {
output_file <- paste0(name, "_joined.csv")
joindf_by_id(df1, df2, output_file, key_df1 = "Cell_ID", key_df2 = "plate well")
}, df1 = PE_list, df2 = list_weights, name = names(listPE))
##
## Values needing repetition:
## Blank01 needs to be repeated
## Lab_08 needs to be repeated
## Lab_14 needs to be repeated
## Lab_16 needs to be repeated
## Lab_20 needs to be repeated
## Lab_22 needs to be repeated
## Lab_39 needs to be repeated
## Lab_45 needs to be repeated
## Join complete. Output saved to: PE1_joined.csv
##
## Values needing repetition:
## Blank02 needs to be repeated
## Blank03 needs to be repeated
## Lab_02 needs to be repeated
## Lab_03 needs to be repeated
## Lab_04 needs to be repeated
## Lab_14 needs to be repeated
## Lab_24 needs to be repeated
## Lab_26 needs to be repeated
## Join complete. Output saved to: PE2_joined.csv
##
## Values needing repetition:
## juice ilo needs to be repeated
## Lima R03 oven dry needs to be repeated
## Blank_06 needs to be repeated
## Join complete. Output saved to: PE3_joined.csv
## PE1_tidy
## result_df tbl_df,12
## merged_cells 88
## unmatched_cells 8
## unmatched_saved_to "unmatched_rows_PE1_joined.csv"
## assigned_to_global TRUE
## PE2_tidy
## result_df tbl_df,12
## merged_cells 86
## unmatched_cells 10
## unmatched_saved_to "unmatched_rows_PE2_joined.csv"
## assigned_to_global TRUE
## PE3_tidy
## result_df tbl_df,10
## merged_cells 88
## unmatched_cells 4
## unmatched_saved_to "unmatched_rows_PE3_joined.csv"
## assigned_to_global TRUE
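Under the hood, the ID-matching described above might be sketched like this. This is illustrative only; the real joindf_by_id() in process_plate_run.R also reports values needing repetition and assigns the result to the global environment:

```r
library(dplyr)

# Sketch of a key-standardizing join that also writes unmatched rows to CSV.
joindf_by_id_sketch <- function(df1, df2, output_file,
                                key_df1 = "Cell_ID", key_df2 = "plate well") {
  # Standardize both keys into a common join_id column
  df1 <- df1 %>% rename(join_id = all_of(key_df1))
  df2 <- df2 %>% rename(join_id = all_of(key_df2))
  joined    <- inner_join(df1, df2, by = "join_id")
  unmatched <- anti_join(df1, df2, by = "join_id")
  write.csv(joined, output_file, row.names = FALSE)
  write.csv(unmatched, paste0("unmatched_rows_", output_file), row.names = FALSE)
  list(result_df       = joined,
       merged_cells    = nrow(joined),
       unmatched_cells = nrow(unmatched))
}
```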
STEP 4: CALCULATE PE CONTENT
This function takes a tidy data frame containing absorbance measurements and
calculates the concentration of phycoerythrin (PE) for each sample based
on the Beer & Eshel (1985) method. It performs the following
steps:
Reference: Beer S, Eshel A (1985) Determining phycoerythrin and phycocyanin concentrations in aqueous crude extracts of red algae. Aust J Mar Freshwater Res 36:785–792. https://doi.org/10.1071/MF9850785
Filtering Negative PE Values:
Samples with negative PE concentrations are identified and removed.
These removed samples are printed in the console with the reason for
removal.
Normalization to Sample Weight:
The PE concentration is converted from micrograms per milliliter (µg/mL)
to milligrams per gram of dry sample (mg/g) by adjusting for the extract
volume and the individual sample weights provided in the data frame.
This ensures accurate PE quantification based on the exact weight of
each sample rather than using a fixed default weight.
Output and Metadata:
The function returns a filtered data frame containing valid samples with
their calculated PE concentrations in mg/g. Additionally, it stores the
filtered-out rows (samples with negative PE) as an attribute called
"removed_rows_pe" for further inspection if
needed.
This process helps ensure the quality and accuracy of PE measurements by excluding invalid data and normalizing concentrations based on actual sample weights.
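The core of the calculation can be sketched as below. The PE equation is the one published by Beer & Eshel (1985), using the absorbance columns present in these data; the extract-volume parameter and the weight-normalization details are illustrative assumptions, since the real calculate_pe_and_filter() lives in process_plate_run.R:

```r
library(dplyr)

# Beer & Eshel (1985) equation for phycoerythrin in crude extracts:
#   PE (mg/mL) = [(A564 - A592) - (A455 - A592) * 0.20] * 0.12
# extract_volume_mL is an assumed parameter for the weight normalization.
calc_pe_sketch <- function(df, extract_volume_mL = 1) {
  df %>%
    mutate(
      PE_mg_per_mL = ((X564 - X592) - (X455 - X592) * 0.20) * 0.12,
      total_PE_mg  = PE_mg_per_mL * extract_volume_mL,
      # Normalize to each sample's own measured weight rather than a default
      PE_mg_per_g_sample = total_PE_mg / `sample weight`
    ) %>%
    filter(PE_mg_per_mL >= 0)  # drop samples with negative PE
}
```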
PE1_calc <- calculate_pe_and_filter(PE1_joined, sample_weight_col = "sample weight", sample_id_col="join_id")
## Removed observations due to negative PE values:
## # A tibble: 26 × 3
## join_id PE_mg_per_mL Removal_Reason
## <chr> <dbl> <chr>
## 1 A05 -0.000168 removed because PE < 0
## 2 A07 -0.0000960 removed because PE < 0
## 3 A11 -0.000144 removed because PE < 0
## 4 A12 -0.0000480 removed because PE < 0
## 5 B01 -0.0000960 removed because PE < 0
## 6 B05 -0.00106 removed because PE < 0
## 7 B09 -0.0000240 removed because PE < 0
## 8 C07 -0.0000240 removed because PE < 0
## 9 D10 -0.000216 removed because PE < 0
## 10 E04 -0.0000480 removed because PE < 0
## # ℹ 16 more rows
PE2_calc <- calculate_pe_and_filter(PE2_joined, sample_weight_col = "sample weight", sample_id_col="join_id")
## Removed observations due to negative PE values:
## # A tibble: 12 × 3
## join_id PE_mg_per_mL Removal_Reason
## <chr> <dbl> <chr>
## 1 A01 -0.000384 removed because PE < 0
## 2 A05 -0.00319 removed because PE < 0
## 3 D01 -0.0000480 removed because PE < 0
## 4 D03 -0.0000240 removed because PE < 0
## 5 E01 -0.000120 removed because PE < 0
## 6 E03 -0.0000960 removed because PE < 0
## 7 E04 -0.000552 removed because PE < 0
## 8 E06 -0.000144 removed because PE < 0
## 9 E07 -0.0000720 removed because PE < 0
## 10 E09 -0.0000480 removed because PE < 0
## 11 F01 -0.00175 removed because PE < 0
## 12 G04 -0.0000240 removed because PE < 0
PE3_calc <- calculate_pe_and_filter(PE3_joined, sample_weight_col = "sample weight", sample_id_col="join_id")
## Removed observations due to negative PE values:
## # A tibble: 12 × 3
## join_id PE_mg_per_mL Removal_Reason
## <chr> <dbl> <chr>
## 1 E09 -0.0000720 removed because PE < 0
## 2 E10 -0.0000720 removed because PE < 0
## 3 E12 -0.0000720 removed because PE < 0
## 4 F03 -0.000216 removed because PE < 0
## 5 F05 -0.0000240 removed because PE < 0
## 6 G02 -0.000120 removed because PE < 0
## 7 G04 -0.0000960 removed because PE < 0
## 8 G05 -0.0000720 removed because PE < 0
## 9 G12 -0.0000720 removed because PE < 0
## 10 H03 -0.0000480 removed because PE < 0
## 11 H09 -0.0000480 removed because PE < 0
## 12 H10 -0.0000240 removed because PE < 0
fluor_list <- list(
Fluor1 = Fluor1_tidy,
Fluor2 = Fluor2_tidy,
Fluor3 = Fluor3_tidy
)
library(ggplot2)
library(plotly)
library(htmltools)
plots <- list() # Initialize empty list to store interactive plots
for (name in names(fluor_list)) {
df <- fluor_list[[name]]
p <- ggplot(df, aes(x = Cell_ID, y = Xred)) +
geom_bar(stat = "identity", fill = "skyblue") +
geom_text(aes(label = round(Xred, 3)), vjust = -0.5, size = 3) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
labs(
title = paste("530/25,590/35 -", name),
x = "Sample (Cell_ID)",
y = "Fluorescence"
)
# Save static version of plot (optional)
filename <- paste0(name, "_fluorescence.png")
ggsave(filename, plot = p, width = 10, height = 6, dpi = 300)
# Store the interactive version of the plot
plots[[name]] <- ggplotly(p)
}
# Display all interactive plots together in the HTML output
htmltools::tagList(plots)
Let's join this data to the PE data that has already been joined with the weights and IDs table.
joindf_by_id(PE1_calc, Fluor1_tidy, "pe_fluor1.csv", key_df1 = "join_id", key_df2 = "Cell_ID")
##
## Values needing repetition:
## A01 needs to be repeated
## A05 needs to be repeated
## A07 needs to be repeated
## A11 needs to be repeated
## A12 needs to be repeated
## B01 needs to be repeated
## B05 needs to be repeated
## B09 needs to be repeated
## C01 needs to be repeated
## C07 needs to be repeated
## D07 needs to be repeated
## D10 needs to be repeated
## E01 needs to be repeated
## E04 needs to be repeated
## E08 needs to be repeated
## E09 needs to be repeated
## E10 needs to be repeated
## E11 needs to be repeated
## F01 needs to be repeated
## F07 needs to be repeated
## F08 needs to be repeated
## G01 needs to be repeated
## G02 needs to be repeated
## G03 needs to be repeated
## G04 needs to be repeated
## G05 needs to be repeated
## G06 needs to be repeated
## G07 needs to be repeated
## G09 needs to be repeated
## G10 needs to be repeated
## H05 needs to be repeated
## H07 needs to be repeated
## H08 needs to be repeated
## H12 needs to be repeated
## Join complete. Output saved to: pe_fluor1.csv
## $result_df
## # A tibble: 62 × 16
## join_id X455 X564 X592 X618 X645 X565 diff_absorbance ID
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 A02 0.03 0.018 0.012 0.012 0.009 0.017 -0.00100 Lab_01
## 2 A03 0.029 0.019 0.012 0.011 0.008 0.019 0 Lab_01
## 3 A04 0.022 0.013 0.009 0.009 0.007 0.013 0 Lab_01
## 4 A06 0.015 0.007 0.0050 0.00400 0.00200 0.0050 -0.00200 Lab_02
## 5 A08 0.015 0.009 0.006 0.0050 0.00300 0.008 -0.00100 Lab_03
## 6 A09 0.029 0.012 0.007 0.007 0.00300 0.01 -0.00200 Lab_03
## 7 A10 0.063 0.053 0.044 0.04 0.035 0.048 -0.00500 Lab_03
## 8 B02 0.018 0.008 0.0050 0.00400 0.00300 0.008 0 Lab_05
## 9 B03 0.028 0.016 0.013 0.012 0.01 0.016 0 Lab_05
## 10 B04 0.019 0.008 0.0050 0.00400 0.00300 0.008 0 Lab_05
## # ℹ 52 more rows
## # ℹ 7 more variables: `sample weight` <dbl>, `Tray well` <chr>, date <dttm>,
## # PE_mg_per_mL <dbl>, total_PE_mg <dbl>, PE_mg_per_g_sample <dbl>, Xred <dbl>
##
## $merged_cells
## [1] 62
##
## $unmatched_cells
## [1] 34
##
## $unmatched_saved_to
## [1] "unmatched_rows_pe_fluor1.csv"
##
## $assigned_to_global
## [1] TRUE
joindf_by_id(PE2_calc, Fluor2_tidy, "pe_fluor2.csv", key_df1 = "join_id", key_df2 = "Cell_ID")
##
## Values needing repetition:
## A01 needs to be repeated
## A03 needs to be repeated
## A05 needs to be repeated
## A07 needs to be repeated
## A09 needs to be repeated
## A11 needs to be repeated
## B03 needs to be repeated
## D01 needs to be repeated
## D03 needs to be repeated
## D07 needs to be repeated
## E01 needs to be repeated
## E03 needs to be repeated
## E04 needs to be repeated
## E06 needs to be repeated
## E07 needs to be repeated
## E09 needs to be repeated
## F01 needs to be repeated
## G02 needs to be repeated
## G03 needs to be repeated
## G04 needs to be repeated
## G07 needs to be repeated
## Join complete. Output saved to: pe_fluor2.csv
## $result_df
## # A tibble: 74 × 16
## join_id X564 X592 X455 X565 diff_absorbance ID `sample weight`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 A04 0.0137 0.00967 0.0167 0.0127 -0.00100 Lab_… 24.5
## 2 A06 0.0103 0.00433 0.0163 0.0103 0 Lab_… 24.1
## 3 A08 0.00367 0.000667 0.0127 0.00367 0 Lab_… 23.7
## 4 A10 0.0103 0.00533 0.0213 0.00933 -0.00100 Lab_… 23
## 5 A12 0.00867 0.00267 0.0127 0.00867 0 Lab_… 26.9
## 6 B01 0.0163 0.00733 0.0213 0.0153 -0.00100 Lab_… 23.6
## 7 B02 0.00933 0.00533 0.0253 0.00933 0 Lab_… 24.2
## 8 B04 0.00467 0.00167 0.0117 0.00467 0 Lab_… 26.4
## 9 B05 0.0123 0.00733 0.0223 0.0113 -0.00100 Lab_… 24.4
## 10 B06 0.0643 0.0533 0.0763 0.0603 -0.00400 Lab_… 24.4
## # ℹ 64 more rows
## # ℹ 8 more variables: `Tray well` <chr>, date <dttm>, ...6 <lgl>, ...7 <chr>,
## # PE_mg_per_mL <dbl>, total_PE_mg <dbl>, PE_mg_per_g_sample <dbl>, Xred <dbl>
##
## $merged_cells
## [1] 74
##
## $unmatched_cells
## [1] 21
##
## $unmatched_saved_to
## [1] "unmatched_rows_pe_fluor2.csv"
##
## $assigned_to_global
## [1] TRUE
joindf_by_id(PE3_calc, Fluor3_tidy, "pe_fluor3.csv", key_df1 = "join_id", key_df2 = "Cell_ID")
##
## Values needing repetition:
## E06 needs to be repeated
## E07 needs to be repeated
## E08 needs to be repeated
## E11 needs to be repeated
## F01 needs to be repeated
## F04 needs to be repeated
## F06 needs to be repeated
## F07 needs to be repeated
## F08 needs to be repeated
## F09 needs to be repeated
## F10 needs to be repeated
## F11 needs to be repeated
## F12 needs to be repeated
## G03 needs to be repeated
## G06 needs to be repeated
## G07 needs to be repeated
## G08 needs to be repeated
## G09 needs to be repeated
## G10 needs to be repeated
## G11 needs to be repeated
## H01 needs to be repeated
## H02 needs to be repeated
## H04 needs to be repeated
## H05 needs to be repeated
## H06 needs to be repeated
## H07 needs to be repeated
## H08 needs to be repeated
## H12 needs to be repeated
## Join complete. Output saved to: pe_fluor3.csv
## $result_df
## # A tibble: 58 × 14
## join_id Xred X564 X565 X592 X455 diff_absorbance ID
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 A01 8004. 0.0193 0.0193 0.0123 0.0343 0 Lab_40
## 2 A02 7147. 0.0133 0.0133 0.00633 0.0253 0 Lab_40
## 3 A03 10370. 0.022 0.022 0.012 0.039 0 Lab_40
## 4 A04 20048. 0.0193 0.0193 0.00433 0.0243 0 Lab_41
## 5 A05 21192. 0.0283 0.0273 0.0123 0.0263 -0.00100 Lab_41
## 6 A06 22423. 0.0343 0.0333 0.0153 0.0503 -0.00100 Lab_41
## 7 A07 12791. 0.018 0.018 0.007 0.028 0 Lab_42
## 8 A08 14773. 0.0163 0.0163 0.00333 0.0233 0 Lab_42
## 9 A09 14830. 0.0223 0.0223 0.0103 0.0323 0 Lab_42
## 10 A10 37830. 0.0603 0.0593 0.0243 0.0403 -0.00100 Lab_43
## # ℹ 48 more rows
## # ℹ 6 more variables: `sample weight` <dbl>, `Tray well` <chr>, date <dttm>,
## # PE_mg_per_mL <dbl>, total_PE_mg <dbl>, PE_mg_per_g_sample <dbl>
##
## $merged_cells
## [1] 52
##
## $unmatched_cells
## [1] 28
##
## $unmatched_saved_to
## [1] "unmatched_rows_pe_fluor3.csv"
##
## $assigned_to_global
## [1] TRUE
pe_fluor_all <- list(pe_fluor1, pe_fluor2, pe_fluor3)
# Add a run column based on list position
pe_fluor_all <- Map(function(df, i) {
df$run <- factor(paste0("run", i))
df
}, pe_fluor_all, seq_along(pe_fluor_all))
common_cols <- Reduce(intersect, lapply(pe_fluor_all, names))
# Keep only the common columns in each dataframe before binding
combined_df <- bind_rows(
lapply(pe_fluor_all, function(df) df[common_cols]),
.id = "run"
)
#Finally merge all spreadsheets into one for replicate testing and analysis
#Scale PE bigger for regression purposes
combined_df <- combined_df %>%
mutate(PE_scaled = PE_mg_per_g_sample * 1000)
1️⃣ Full Dataset Plot with Points Colored by Run
p_all <- ggplot(combined_df, aes(x = PE_mg_per_g_sample, y = Xred, color = run)) +
geom_point(alpha = 0.6) +
theme_minimal() +
labs(title = "Xred vs PE_mg_per_g_sample (All Runs)",
x = "PE (mg/g sample)",
y = "Xred Fluorescence") +
scale_color_brewer(palette = "Dark2")
ggsave("Xred_PE_all_runs.png", plot = p_all, width = 10, height = 6, dpi = 300)
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_point()`).
print(p_all)
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Remove unwanted rows first
combined_df <- combined_df[-(190:194), ]
# Add new factor level before assignment
levels(combined_df$run) <- c(levels(combined_df$run), "run 3 fresh")
# Assign new factor value to rows 169 to 189 (adjusted for new row count)
combined_df$run[169:189] <- "run 3 fresh"
2️⃣ Loop Through Each Run and Save Regressions
runs <- levels(combined_df$run)
for (r in runs) {
df_sub <- combined_df %>% filter(run == r)
if (nrow(df_sub) < 4) {
warning(paste("Skipping run", r, "- not enough data"))
next
}
#df_sub <- combined_df %>% filter(run == "run1")
# Fit models
model_linear <- lm(Xred ~ PE_scaled, data = df_sub)
model_log <- lm(Xred ~ log(PE_scaled + 0.001), data = df_sub)
model_poly <- lm(Xred ~ PE_scaled + I(PE_scaled^2), data = df_sub)
# Get annotation
ann_linear <- get_stats(model_linear, "Linear")
ann_log <- get_stats(model_log, "Log")
ann_poly <- get_stats(model_poly, "Poly")
# Plot
p <- ggplot(df_sub, aes(x = PE_scaled, y = Xred)) +
geom_point(alpha = 0.6, color = "black") +
geom_text_repel(aes(label = join_id), size = 2, max.overlaps = 50) + # geom_text_repel() comes from the ggrepel package
stat_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue") +
stat_smooth(method = "lm", formula = y ~ log(x + 0.001), se = FALSE, color = "green", linetype = "dashed") +
stat_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE, color = "red", linetype = "dotdash") +
annotate("text", x = Inf, y = Inf, label = ann_linear, hjust = 1.05, vjust = 2, color = "blue", size = 4) +
annotate("text", x = Inf, y = Inf, label = ann_log, hjust = 1.05, vjust = 3.5, color = "green", size = 4) +
annotate("text", x = Inf, y = Inf, label = ann_poly, hjust = 1.05, vjust = 5, color = "red", size = 4) +
theme_minimal() +
labs(title = paste("Regression Models: Xred vs PE (", r, ")"),
x = "PE (mg/g sample)",
y = "Xred Fluorescence")
# Save plot
filename <- paste0("Xred_PE_regressions_", r, ".png")
ggsave(filename, plot = p, width = 10, height = 6, dpi = 300)
plot(p)
}
## Warning: Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).
## Warning: Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).
🔍 analyze_replicates() FUNCTION
This function processes replicate data for each sample, calculates summary statistics, and optionally selects the best 3 replicates that minimize the coefficient of variation (CV) to reduce the effect of outliers or noisy data.
Key Features: Input: A data frame with replicate observations and associated metadata (e.g., sample weight, join_id, date, etc.).
Output: A summary table with per-sample means, standard deviations, standard errors, CVs, and maximum deviation percentages for each numeric variable.
Optional Filtering: When choose_best_3 = TRUE, the function:
Evaluates all combinations of 3 replicates,
Selects the combination with the lowest CV,
Computes statistics only on those 3 replicates.
Metadata Summary: Also calculates:
The number of replicates per sample,
Average sample weight,
Replication dates,
A list of included/excluded rows (if applicable).
Output File: A wide-format CSV file (e.g., replicate_analysis_summary.csv) containing one row per sample and all summary statistics.
This code chunk runs analyze_replicates() on combined_df twice: first using all replicates (output prefix "all_rep_analy"), then selecting the best 3 replicates per sample based on consistency (output prefix "E_rep_analy"). It then generates histograms with error bars for key variables (PE_mg_per_g_sample, Xred) to visualize the distribution and variation across samples.
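The best-3 selection can be sketched with base R's combn(): evaluate every 3-replicate subset and keep the one with the lowest CV. This is illustrative; the real analyze_replicates() applies the idea per variable and per sample:

```r
# For one sample's replicate values, pick the 3 that minimize CV (sd/mean).
best_three_sketch <- function(values) {
  if (length(values) <= 3) return(values)
  combos <- combn(values, 3)                 # all 3-replicate subsets (columns)
  cvs <- apply(combos, 2, function(x) sd(x) / mean(x))
  combos[, which.min(cvs)]                   # subset with the lowest CV
}

best_three_sketch(c(10, 11, 10.5, 25))  # the noisy replicate 25 is excluded
```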
#combine technical replicates
analyze_replicates(combined_df,
id_col = "ID",
join_col = "join_id",
weight_col = "sample weight",
date_col = "date",
output_prefix = "all_rep_analy",
choose_best_3 = FALSE)
## Summary saved to: all_rep_analy_summary.csv
## # A tibble: 51 × 65
## ID PE_mg_per_g_sample_m…¹ PE_mg_per_mL_mean PE_scaled_mean X455_mean
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 juice fres… 0.217 0.00276 217. 0.0243
## 2 juice fres… 0.362 0.00335 362. 0.0262
## 3 juice ilo 0.0613 0.000624 61.3 0.0103
## 4 Lab_01 0.0346 0.000326 34.6 0.0228
## 5 Lab_02 0.00228 0.0000360 2.28 0.0138
## 6 Lab_03 0.0328 0.000307 32.8 0.0282
## 7 Lab_04 0.0236 0.000372 23.6 0.0233
## 8 Lab_05 0.0132 0.000200 13.2 0.0292
## 9 Lab_06 0.0860 0.000869 86.0 0.173
## 10 Lab_07 0.0535 0.000562 53.5 0.0195
## # ℹ 41 more rows
## # ℹ abbreviated name: ¹PE_mg_per_g_sample_mean
## # ℹ 60 more variables: X564_mean <dbl>, X565_mean <dbl>, X592_mean <dbl>,
## # Xred_mean <dbl>, diff_absorbance_mean <dbl>, total_PE_mg_mean <dbl>,
## # PE_mg_per_g_sample_sd <dbl>, PE_mg_per_mL_sd <dbl>, PE_scaled_sd <dbl>,
## # X455_sd <dbl>, X564_sd <dbl>, X565_sd <dbl>, X592_sd <dbl>, Xred_sd <dbl>,
## # diff_absorbance_sd <dbl>, total_PE_mg_sd <dbl>, …
#combine technical replicates choose best three
analyze_replicates(combined_df,
id_col = "ID",
join_col = "join_id",
weight_col = "sample weight",
date_col = "date",
output_prefix = "E_rep_analy",
choose_best_3 = TRUE)
## Summary saved to: E_rep_analy_summary.csv
## # A tibble: 51 × 85
## ID PE_mg_per_g_sample_m…¹ PE_mg_per_mL_mean PE_scaled_mean X455_mean
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 juice fres… 0.217 0.00276 217. 0.0243
## 2 juice fres… 0.362 0.00335 362. 0.0262
## 3 juice ilo 0.0613 0.000624 61.3 0.0103
## 4 Lab_01 0.0229 0.000392 22.9 0.027
## 5 Lab_02 0.00228 0.0000360 2.28 0.0138
## 6 Lab_03 0.0204 0.00044 20.4 0.0163
## 7 Lab_04 0.0236 0.000372 23.6 0.0233
## 8 Lab_05 0.00569 0.000136 5.69 0.0198
## 9 Lab_06 0.0641 0.00127 64.1 0.0523
## 10 Lab_07 0.0735 0.000616 73.5 0.0178
## # ℹ 41 more rows
## # ℹ abbreviated name: ¹PE_mg_per_g_sample_mean
## # ℹ 80 more variables: X564_mean <dbl>, X565_mean <dbl>, X592_mean <dbl>,
## # Xred_mean <dbl>, diff_absorbance_mean <dbl>, total_PE_mg_mean <dbl>,
## # PE_mg_per_g_sample_sd <dbl>, PE_mg_per_mL_sd <dbl>, PE_scaled_sd <dbl>,
## # X455_sd <dbl>, X564_sd <dbl>, X565_sd <dbl>, X592_sd <dbl>, Xred_sd <dbl>,
## # diff_absorbance_sd <dbl>, total_PE_mg_sd <dbl>, …
#Make a bunch of graphs 🖨️🖨️🖨️
graph_histograms_with_error(
data = combined_df,
variables = c("PE_mg_per_g_sample", "Xred"),
id_col = "ID"
)
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Plots saved to folder: histogram_analysis_plots
## $PE_mg_per_g_sample
##
## $Xred
Check the effects of the best-3 enhancement algorithm.
PErep_enhanced <- read_csv("E_rep_analy_summary.csv") # retrieve analyzed replicates from the enhanced (best-3) CV selection
## Rows: 51 Columns: 85
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): ID, join_ids, replicate_date
## dbl (62): PE_mg_per_g_sample_mean, PE_mg_per_mL_mean, PE_scaled_mean, X455_m...
## num (20): PE_mg_per_g_sample_included_rows, PE_mg_per_mL_included_rows, PE_s...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
PErep_all <- read_csv("all_rep_analy_summary.csv") # retrieve analyzed replicates without enhancement
## Rows: 51 Columns: 65
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): ID, join_ids, replicate_date
## dbl (62): PE_mg_per_g_sample_mean, PE_mg_per_mL_mean, PE_scaled_mean, X455_m...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Note, this is a product of analyze_replicates using enhanced replicate selection.
SE_change <- PErep_enhanced$PE_mg_per_g_sample_max_dev_pct - PErep_all$PE_mg_per_g_sample_max_dev_pct
print(SE_change)
## [1] 0.0000000 0.0000000 0.0000000 -64.2502175 0.0000000
## [6] -151.0746796 0.0000000 -222.4021594 -123.2779150 -50.8332025
## [11] -56.5204226 -85.1448923 -50.2539810 -67.2754758 -73.3978154
## [16] -52.2745398 -0.1267003 -55.1960482 0.0000000 0.0000000
## [21] -65.7598339 0.0000000 0.0000000 -48.5333937 -43.4997256
## [26] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [31] 0.0000000 0.0000000 -72.5501845 -56.8479013 0.0000000
## [36] 0.0000000 -56.0968444 -70.4508636 -66.0218326 -49.2541002
## [41] -82.4368101 -24.2029729 0.0000000 0.0000000 0.0000000
## [46] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [51] NaN
SE_change <- SE_change[-51] #remove one NaN at the end
paste("# number of replicates SE acted upon:", length(SE_change[SE_change != 0]))
## [1] "# number of replicates SE acted upon: 24"
paste("average improvement by max deviation as a percent of the mean:", abs(mean(SE_change[SE_change != 0], na.rm = TRUE)))
## [1] "average improvement by max deviation as a percent of the mean: 70.32010467602"
joindf_by_id(PErep_enhanced, `Sample data`, "PErep_final.csv", key_df1 = "ID", key_df2 = "ID")
##
## Values needing repetition:
## Lab_27 needs to be repeated
## Lab_28 needs to be repeated
## Lab_29 needs to be repeated
## Lab_32 needs to be repeated
## Lab_47 needs to be repeated
## Lab_52 needs to be repeated
## Lab_53 needs to be repeated
## Lab_54 needs to be repeated
## Lab_55 needs to be repeated
## Lab_56 needs to be repeated
## Lab_57 needs to be repeated
## Lab_58 needs to be repeated
## Lab_59 needs to be repeated
## Lab_60 needs to be repeated
## Lab_61 needs to be repeated
## Lab_62 needs to be repeated
## Lab_63 needs to be repeated
## Lab_64 needs to be repeated
## Join complete. Output saved to: PErep_final.csv
## $result_df
## # A tibble: 51 × 94
## join_id PE_mg_per_g_sample_m…¹ PE_mg_per_mL_mean PE_scaled_mean X455_mean
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 juice fres… 0.217 0.00276 217. 0.0243
## 2 juice fres… 0.362 0.00335 362. 0.0262
## 3 juice ilo 0.0613 0.000624 61.3 0.0103
## 4 Lab_01 0.0229 0.000392 22.9 0.027
## 5 Lab_02 0.00228 0.0000360 2.28 0.0138
## 6 Lab_03 0.0204 0.00044 20.4 0.0163
## 7 Lab_04 0.0236 0.000372 23.6 0.0233
## 8 Lab_05 0.00569 0.000136 5.69 0.0198
## 9 Lab_06 0.0641 0.00127 64.1 0.0523
## 10 Lab_07 0.0735 0.000616 73.5 0.0178
## # ℹ 41 more rows
## # ℹ abbreviated name: ¹PE_mg_per_g_sample_mean
## # ℹ 89 more variables: X564_mean <dbl>, X565_mean <dbl>, X592_mean <dbl>,
## # Xred_mean <dbl>, diff_absorbance_mean <dbl>, total_PE_mg_mean <dbl>,
## # PE_mg_per_g_sample_sd <dbl>, PE_mg_per_mL_sd <dbl>, PE_scaled_sd <dbl>,
## # X455_sd <dbl>, X564_sd <dbl>, X565_sd <dbl>, X592_sd <dbl>, Xred_sd <dbl>,
## # diff_absorbance_sd <dbl>, total_PE_mg_sd <dbl>, …
##
## $merged_cells
## [1] 44
##
## $unmatched_cells
## [1] 18
##
## $unmatched_saved_to
## [1] "unmatched_rows_PErep_final.csv"
##
## $assigned_to_global
## [1] TRUE
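`joindf_by_id` is a custom helper from process_plate_run.R. Its core merge step is roughly equivalent to the following dplyr sketch; the "needs to be repeated" flagging, unmatched-row export, and global assignment seen in the output above are extra behaviors of the real function and are not reproduced here.

```r
# Rough dplyr equivalent of joindf_by_id's core step (assumption: a keyed left join
# followed by writing the merged table to CSV; function and argument names are illustrative).
library(dplyr)
library(readr)

join_by_id_sketch <- function(df1, df2, out_csv, key_df1 = "ID", key_df2 = "ID") {
  merged <- left_join(df1, df2, by = setNames(key_df2, key_df1))
  write_csv(merged, out_csv)
  merged
}
```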
graph_replicates_custom_error(
data = PErep_final,
id_col = "join_id",
value_col = "PE_mg_per_g_sample_mean",
se_col = "PE_mg_per_g_sample_se",
output_prefix = "E_rep_analy"
)
## Saving 7 x 5 in image
## Plot saved to: E_rep_analy_plots/PE_mg_per_g_sample_mean_replicates.png
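`graph_replicates_custom_error` is also a custom helper; under the assumption that it draws per-sample means with precomputed standard-error bars, its core plot could be sketched with ggplot2 like this:

```r
# Minimal ggplot2 sketch (assumption) of a replicate-means plot with
# precomputed SE columns supplied by the caller.
library(ggplot2)

plot_means_with_se <- function(data, id_col, value_col, se_col) {
  ggplot(data, aes(x = .data[[id_col]], y = .data[[value_col]])) +
    geom_point() +
    geom_errorbar(aes(ymin = .data[[value_col]] - .data[[se_col]],
                      ymax = .data[[value_col]] + .data[[se_col]]),
                  width = 0.3) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
}
```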
compare_groups(PErep_final, response_var = "PE_mg_per_g_sample_mean", group_var = "Location") #PE data by location
##
## Attaching package: 'rlang'
## The following objects are masked from 'package:purrr':
##
## %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
## flatten_raw, invoke, splice
## Parametric test result (ANOVA):
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 9 0.07231 0.008035 3.511 0.00363 **
## Residuals 34 0.07781 0.002289
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 7 observations deleted due to missingness
## Non-parametric test result (Kruskal-Wallis test):
##
## Kruskal-Wallis rank sum test
##
## data: PE_mg_per_g_sample_mean by Location
## Kruskal-Wallis chi-squared = 17.66, df = 9, p-value = 0.03933
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_point()`).
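As the outputs show, `compare_groups` pairs a parametric test with its non-parametric counterpart: t-test/Wilcoxon when the grouping variable has two levels, ANOVA/Kruskal-Wallis otherwise. A minimal sketch of that dispatch logic, assuming this is how the helper chooses its tests:

```r
# Sketch (assumption) of the test pairing reported by compare_groups;
# the real function also builds the comparison plots, which are omitted here.
compare_groups_sketch <- function(data, response_var, group_var) {
  f <- reformulate(group_var, response = response_var)
  n_levels <- length(unique(na.omit(data[[group_var]])))
  if (n_levels == 2) {
    list(parametric    = t.test(f, data = data),      # Welch two-sample t-test
         nonparametric = wilcox.test(f, data = data)) # Wilcoxon rank sum
  } else {
    list(parametric    = summary(aov(f, data = data)), # one-way ANOVA
         nonparametric = kruskal.test(f, data = data)) # Kruskal-Wallis
  }
}
```

Reporting both tests side by side is useful here because several comparisons (e.g. Xred_mean by Location) are significant parametrically but not non-parametrically, suggesting sensitivity to distributional assumptions.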
compare_groups(PErep_final, response_var = "Xred_mean", group_var = "Location") # Fluorescence by location
## Parametric test result (ANOVA):
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 9 3.816e+09 423961246 4.744 0.000399 ***
## Residuals 34 3.039e+09 89367983
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 7 observations deleted due to missingness
## Non-parametric test result (Kruskal-Wallis test):
##
## Kruskal-Wallis rank sum test
##
## data: Xred_mean by Location
## Kruskal-Wallis chi-squared = 10.857, df = 9, p-value = 0.2857
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_point()`).
compare_groups(PErep_final, response_var = "PE_mg_per_g_sample_mean", group_var = "variety") #PE data by variety
## Parametric test result (t-test):
##
## Welch Two Sample t-test
##
## data: PE_mg_per_g_sample_mean by variety
## t = 2.7436, df = 41.298, p-value = 0.008949
## alternative hypothesis: true difference in means between group C.Cham and group F. Glom is not equal to 0
## 95 percent confidence interval:
## 0.01003143 0.06594116
## sample estimates:
## mean in group C.Cham mean in group F. Glom
## 0.06949904 0.03151275
## Non-parametric test result (Wilcoxon test):
##
## Wilcoxon rank sum exact test
##
## data: PE_mg_per_g_sample_mean by variety
## W = 271, p-value = 0.03735
## alternative hypothesis: true location shift is not equal to 0
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_point()`).
compare_groups(PErep_final, response_var = "PE_mg_per_g_sample_mean", group_var = "Life_S") #PE data by life stage
## Parametric test result (ANOVA):
## Df Sum Sq Mean Sq F value Pr(>F)
## Life_S 4 0.00874 0.002184 0.568 0.687
## Residuals 35 0.13453 0.003844
## 11 observations deleted due to missingness
## Non-parametric test result (Kruskal-Wallis test):
##
## Kruskal-Wallis rank sum test
##
## data: PE_mg_per_g_sample_mean by Life_S
## Kruskal-Wallis chi-squared = 4.1876, df = 4, p-value = 0.3812
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_point()`).